National Repository of Grey Literature 32 records found  1 - 10nextend  jump to record: Search took 0.01 seconds. 
Email spam filtering using artificial intelligence
Safonov, Yehor ; Uher, Václav (referee) ; Kolařík, Martin (advisor)
In the modern world, email communication defines itself as the most used technology for exchanging messages between users. It is based on three pillars which contribute to the popularity and stimulate its rapid growth. These pillars are represented by free availability, efficiency and intuitiveness during exchange of information. All of them constitute a significant advantage in the provision of communication services. On the other hand, the growing popularity of email technologies poses considerable security risks and transforms them into an universal tool for spreading unsolicited content. Potential attacks may be aimed at either a specific endpoints or whole computer infrastructures. Despite achieving high accuracy during spam filtering, traditional techniques do not often catch up to rapid growth and evolution of spam techniques. These approaches are affected by overfitting issues, converging into a poor local minimum, inefficiency in highdimensional data processing and have long-term maintainability issues. One of the main goals of this master's thesis is to develop and train deep neural networks using the latest machine learning techniques for successfully solving text-based spam classification problem belonging to the Natural Language Processing (NLP) domain. From a theoretical point of view, the master's thesis is focused on the e-mail communication area with an emphasis on spam filtering. Next parts of the thesis bring attention to the domain of machine learning and artificial neural networks, discuss principles of their operations and basic properties. The theoretical part also covers possible ways of applying described techniques to the area of text analysis and solving NLP. One of the key aspects of the study lies in a detailed comparison of current machine learning methods, their specifics and accuracy when applied to spam filtering. At the beginning of the practical part, focus will be placed on the e-mail dataset processing. This phase was divided into five stages with the motivation of maintaining key features of the raw data and increasing the final quality of the dataset. The created dataset was used for training, testing and validation of types of the chosen deep neural networks. Selected models ULMFiT, BERT and XLNet have been successfully implemented. The master's thesis includes a description of the final data adaptation, neural networks learning process, their testing and validation. In the end of the work, the implemented models are compared using a confusion matrix and possible improvements and concise conclusion are also outlined.
A Classification of a Syndicated Content
Matušov, Izidor ; Očenášek, Pavel (referee) ; Smrčka, Aleš (advisor)
This work deals with a classification of a syndicated content as the possible way of organizing the content. The classification uses algorithms for natural language processing. The main contribution is applying word sense disambiguation algorithm for enhancing the classification, eliminating the learning stage, and using a readability test for improving user experience. The application is implemented as an extensible server-client model. The future work is discussed in the end.
Cleaning, extraction of text and transformation of web pages into vertical format
Švaňa, Miloš ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This thesis deals with the topic of extraction of text from web page, recognition of important contents and its transformation to vertical format, which can be used as a suitable input for other natural language processing tasks. It analyzes the existing solution and its components with emphasis on its disadvantages and describes the design and implementation of new solution based on obtained knowledge.
Scala Programming Language and Its Use for Data Analysis
Kohout, Tomáš ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
This thesis deals with comparing the Scala programming language with other commonly used languages for data analysis. These languages are evaluated on the basis of the following categories: data manipulation and visualization, machine learning and concurent processing capabilities. The evaluation then shows the strengths and weaknesses of Scala. The strengths will be demonstrated on application for email categorization.
Intelligent Mailbox
Pohlídal, Antonín ; Drozd, Michal (referee) ; Chmelař, Petr (advisor)
This master's thesis deals with the use of text classification for sorting of incoming emails. First, there is described the Knowledge Discovery in Databases and there is also analyzed in detail the text classification with selected methods. Further, this thesis describes the email communication and SMTP, POP3 and IMAP protocols. The next part contains design of the system that classifies incoming emails and there are also described realated technologie ie Apache James Server, PostgreSQL and RapidMiner. Further, there is described the implementation of all necessary components. The last part contains an experiments with email server using Enron Dataset.
Extraction of Semantic Relations from Text
Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
Today exists many semi-structured documents, whitch we want convert to structured form. Goal of this work is create a system, that make this task more automatized. That could be difficult problem, because most of these documents are not generated by computer, so system have to tolerate differences. We also need some semantic understanding, thats why we choose only domain of meeting minutes documents.
Actual Events Tracker
Odstrčilík, Martin ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
The goal of the master thesis project was to develop an application for tracking of actual events in the surrounding area of the users. This application should allow the users to view events, create new events and add comments to existing ones. Beyond the implementation of developed application, this project deals with an analysis of the presented problem. The analysis includes a comparison with existing solutions and search for available technologies and frameworks applicable for implementation. Another part inside this work is description of the theory in behind of data classification that is internally used for event and comment analysis. This work also includes a design of appliction including design of user interface, software architecture, database, communication protocol and data classifiers. The main part of this project, the implementation, is described aftewards. At the end of this work, there is a summary of the whole process and also there are given some ideas about enhancing the application in the future.
Adaptive RSS Reader
Luža, Jindřich ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
Purpose of this balcheor thesis is posibility to enhance common RSS reader by extension, which allowing user filter RSS feed depends on that's classification by content to groups.There is discussed problems in common classification and in text classification. Forth, there is reveal teoretical aspect of RSS format, which is needed to be considered in implementation of RSS reader module and prototype of module. At last, testing of used classifier is stated here.
Artificial Intelligence Document Classification
Molnár, Ondřej ; Kačic, Matej (referee) ; Třeštíková, Lenka (advisor)
This paper deals with document classification using artificial intelligence. It describes the principles of classification and machine learning. It also introduces AI methods and presents Naive Bayes classification method in detail. Provides practical implementation of the classifier in MS Office and discusses other possible extensions.
Machine Learning Text Classifier for Short Texts Category Prediction
Drápela, Karel ; Křena, Bohuslav (referee) ; Šimková, Hana (advisor)
This thesis deals with categorization of short spam texts from SMS messages. First part summarizes current methods for text classification and~it's followed by description of several commonly used classifiers. In following chapters test data analysis, program implementation and results are described. The program is able to predict text categories based on predefined set of classes and also estimate classification accuracy on training data. For the two category types, that I designed, classifier reached accuracy of 82% and 92% . Both preprocessing and feature selection had a positive impact on resulting accuracy. It is possible to improve this accuracy further by removing portion of samples, which are difficult to classify. With 80\% recall it is possible to increase accuracy by 8-10%.

National Repository of Grey Literature : 32 records found   1 - 10nextend  jump to record:
Interested in being notified about new results for this query?
Subscribe to the RSS feed.